Towards Large-scale Non-taxonomic Relation Extraction: Estimating the Precision of Rote Extractors
Authors
Abstract
In this paper, we describe a rote extractor that learns patterns for finding semantic relations in unrestricted text, with new procedures for pattern generalisation and scoring. An improved method for estimating the precision of the extracted patterns is presented. We show that our method approximates the precision values as evaluated by hand much better than the procedure traditionally used in rote extractors.
Similar resources
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Exploratory Relation Extraction in Large Text Corpora
In this paper, we propose and demonstrate Exploratory Relation Extraction (ERE), a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration. We draw upon ideas from the information seeking paradigm of Exploratory Search (ES) to enable an exploration process in which users begin with a vaguely defined information ...
Type-Aware Distantly Supervised Relation Extraction with Linked Arguments
Distant supervision has become the leading method for training large-scale relation extractors, with nearly universal adoption in recent TAC knowledge-base population competitions. However, there are still many questions about the best way to learn such extractors. In this paper we investigate four orthogonal improvements: integrating named entity linking (NEL) and coreference resolution into a...
Learning 5000 Relational Extractors
Many researchers are trying to use information extraction (IE) to create large-scale knowledge bases from natural language text on the Web. However, the primary approach (supervised learning of relation-specific extractors) requires manually-labeled training data for each relation and doesn’t scale to the thousands of relations encoded in Web text. This paper presents LUCHS, a self-supervised, ...
Bootstrapped Self Training for Knowledge Base Population
A central challenge in relation extraction is the lack of supervised training data. Pattern-based relation extractors suffer from low recall, whereas distant supervision yields noisy data which hurts precision. We propose bootstrapped self-training to capture the benefits of both systems: the precision of patterns and the generalizability of trained models. We show that training on the output of...